Longest common substrings with k mismatches

نویسندگان

  • Tomás Flouri
  • Emanuele Giaquinta
  • Kassian Kobert
  • Esko Ukkonen
چکیده

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the length of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics with k-mismatches of S1 and S2 in O(nm) time and O(m) space. Moreover, we also present a theoretical solution for the k = 1 case which runs in O((n + m) log(n + m)) time and uses O(n + m) space, improving over the existing O(nm) time and O(m) space bound of Babenko and

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

ar X iv : 1 40 9 . 16 94 v 2 [ cs . D S ] 1 6 M ar 2 01 5 Longest common substrings with k mismatches

The longest common substring with k-mismatches problem is to find, given two strings S1 and S2, a longest substring A1 of S1 and A2 of S2 such that the Hamming distance between A1 and A2 is ≤ k. We introduce a practical O(nm) time and O(1) space solution for this problem, where n and m are the lengths of S1 and S2, respectively. This algorithm can also be used to compute the matching statistics...

متن کامل

kmacs: the k-Mismatch Avera- ge Common Substring Approach for Phylogeny Reconstruction

The vast majority of sequence comparison methods for phylogeny reconstruction rely on pairwise or multiple sequence alignments. These approaches are in practice not usable for longer sequences such as complete genomes. For this reason alignment-free methods have recently become more popular because they are much faster and usually computable in linear time. Some of these methods are based on re...

متن کامل

Efficient algorithms for the longest common subsequence in $k$-length substrings

Finding the longest common subsequence in k-length substrings (LCSk) is a recently proposed problem motivated by computational biology. This is a generalization of the well-known LCS problem in which matching symbols from two sequences A and B are replaced with matching non-overlapping substrings of length k from A and B. We propose several algorithms for LCSk, being non-trivial incarnations of...

متن کامل

kmacs: the k-mismatch average common substring approach to alignment-free sequence comparison

MOTIVATION Alignment-based methods for sequence analysis have various limitations if large datasets are to be analysed. Therefore, alignment-free approaches have become popular in recent years. One of the best known alignment-free methods is the average common substring approach that defines a distance measure on sequences based on the average length of longest common words between them. Herein...

متن کامل

Longest Common Subsequence in at Least k Length Order-Isomorphic Substrings

We consider the longest common subsequence (LCS) problem with the restriction that the common subsequence is required to consist of at least k length substrings. First, we show an O(mn) time algorithm for the problem which gives a better worst-case running time than existing algorithms, where m and n are lengths of the input strings. Furthermore, we mainly consider the LCS in at least k length ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Process. Lett.

دوره 115  شماره 

صفحات  -

تاریخ انتشار 2015